Abstract. Given the increasing popularity of algorithms for overlapping
clustering, in particular in social network analysis, quantitative
measures are needed to measure the accuracy of a method. Given a
set of true clusters, and the set of clusters found by an algorithm, these
sets of clusters must be compared to see how similar or different the sets
are. A normalized measure is desirable in many contexts, for example
assigning a value of 0 where the two sets are totally dissimilar, and 1
where they are identical.

A measure based on normalized mutual information, [1], has recently
become popular. We demonstrate unintuitive behaviour of this measure,
and show how this can be corrected by using a more conventional
normalization. We compare the results to that of other measures, such
as the Omega index [2].

A C++ implementation is available online.¹

In a non-overlapping scenario, each node belongs to exactly one
cluster. We consider the overlapping case, where a node may belong to
many clusters, or indeed to none. Such a set of clusters has
been referred to as a cover in the literature, and this is the terminology
that we will use.

For a good introduction to our problem of comparing covers of 
overlapping clusters, see [2]. They describe the Rand index, which is
defined only for disjoint (non-overlapping) clusters, and then show 
how to extend it to overlapping clusters. Each pair of nodes is 
considered and the number of clusters in common between the pair 
is counted. Even if a typical node is in many clusters, it's likely that 
a randomly chosen pair of nodes will have zero clusters in common. 
These counts are calculated for both covers and the Omega index is 
defined as the proportion of pairs for which the shared-cluster-count 
is identical, subject to a correction for chance. 
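The pair-counting idea above can be sketched in a few lines. This is a minimal illustration, not the implementation of [2]: the function names are hypothetical, clusters are assumed to be Python sets of integer node ids, and the chance correction is assumed to take the usual (observed - expected) / (1 - expected) form, which the text above does not spell out.

```python
from collections import Counter
from itertools import combinations


def shared_cluster_counts(cover):
    """Map each unordered node pair to the number of clusters containing both."""
    counts = Counter()
    for cluster in cover:
        for pair in combinations(sorted(cluster), 2):
            counts[pair] += 1
    return counts


def omega_index(cover_a, cover_b, n):
    """Chance-corrected agreement of shared-cluster counts over all node pairs."""
    total_pairs = n * (n - 1) // 2
    counts_a = shared_cluster_counts(cover_a)
    counts_b = shared_cluster_counts(cover_b)

    # Pairs absent from both counters share zero clusters in both covers.
    touched = set(counts_a) | set(counts_b)
    agree = sum(1 for pair in touched if counts_a[pair] == counts_b[pair])
    agree += total_pairs - len(touched)
    observed = agree / total_pairs

    # Histogram of "pairs sharing exactly j clusters", including j = 0.
    hist_a = Counter(counts_a.values())
    hist_b = Counter(counts_b.values())
    hist_a[0] = total_pairs - len(counts_a)
    hist_b[0] = total_pairs - len(counts_b)
    expected = sum(hist_a[j] * hist_b[j] for j in hist_a) / total_pairs ** 2

    if expected == 1.0:
        # Degenerate case (assumed convention): every pair has the same count.
        return 1.0
    return (observed - expected) / (1.0 - expected)


same = omega_index([{0, 1}, {1, 2, 3}], [{0, 1}, {1, 2, 3}], 4)
crossed = omega_index([{0, 1}, {2, 3}], [{0, 2}, {1, 3}], 4)
```

Most pairs never appear in the counters, which reflects the remark that a random pair usually has zero clusters in common; only the pairs that co-occur somewhere are enumerated.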




Fig. 1. Mutual information and variation of information. The total
information H(X, Y) = H(X|Y) + I(X : Y) + H(Y|X).



If a + d = n, and therefore b = c = 0, then the two vectors are 
in complete agreement. 

The lack of information between two vectors is defined: 

$$
\begin{aligned}
H(X_i \mid Y_j) &= H(X_i, Y_j) - H(Y_j) && (1) \\
&= h(a, n) + h(b, n) + h(c, n) + h(d, n) && (2) \\
&\quad - h(b + d, n) - h(a + c, n) && (3)
\end{aligned}
$$

where $h(w, n) = -w \log_2 \frac{w}{n}$.

There is an interesting technicality here. Imagine a pair of clusters 
but where the memberships have been defined randomly. There is a 
possibility that there will be a small amount of mutual information, 
even in the situation where the two vectors are negatively correlated 
with each other. In extremis, if the two vectors are near complements 
of each other, mutual information will be very high. We wish to 
override this and define that there is zero mutual information in this 
case. This is defined in equation (B.14) of [1]. We also use this
restriction in our proposal. 
I. MUTUAL INFORMATION

Meila [3] defined a measure based on mutual information for
comparing disjoint clusterings. Lancichinetti et al. [1] proposed a
measure also based on mutual information, extended for covers.
This measure has become quite popular for comparing community
finding algorithms in social network analysis. It is this measure
we are primarily concerned with here, and we will refer to it as
NMI_LFK after the authors' initials.

We are proposing to use a different normalization to that used in
NMI_LFK, but first we will define the non-normalized measure, which
is based very closely on that in NMI_LFK. You may want to compare
this to the final section of Lancichinetti et al. [1].

Given two covers, X and Y, we must first see how to measure the 
similarity between a pair of clusters. X and Y are matrices of cluster 
membership. There are n objects. The first cover has K_X clusters,
and hence X is an n x K_X matrix. Y is an n x K_Y matrix. X_im
tells us whether node m is in cluster i in cover X.

To compare cluster i of the first cover to cluster j of the second
cover, we compare the vectors X_i and Y_j. These are vectors of ones
and zeroes denoting which nodes are in the cluster.

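• $a = \sum_{m=1}^{n} [X_{im} = 0 \wedge Y_{jm} = 0]$

• $b = \sum_{m=1}^{n} [X_{im} = 0 \wedge Y_{jm} = 1]$

• $c = \sum_{m=1}^{n} [X_{im} = 1 \wedge Y_{jm} = 0]$

• $d = \sum_{m=1}^{n} [X_{im} = 1 \wedge Y_{jm} = 1]$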



$$
H^*(X_i \mid Y_j) =
\begin{cases}
H(X_i \mid Y_j) & \text{if } h(a, n) + h(d, n) > h(b, n) + h(c, n) \\
h(c + d, n) + h(a + b, n) & \text{otherwise}
\end{cases}
\qquad (4)
$$
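As a rough illustration of eqs. (1) to (4), the sketch below computes the counts a, b, c, d for two 0/1 membership vectors and applies the override. The function names are hypothetical, and the convention h(0, n) = 0 is assumed (the usual limit of w log w as w goes to zero); this is not taken from the reference C++ code.

```python
import math


def h(w, n):
    """h(w, n) = -w * log2(w / n), with h(0, n) taken as 0 (assumed convention)."""
    return 0.0 if w == 0 else -w * math.log2(w / n)


def confusion_counts(x, y):
    """Return (a, b, c, d) for two equal-length 0/1 membership vectors."""
    a = sum(1 for xm, ym in zip(x, y) if xm == 0 and ym == 0)
    b = sum(1 for xm, ym in zip(x, y) if xm == 0 and ym == 1)
    c = sum(1 for xm, ym in zip(x, y) if xm == 1 and ym == 0)
    d = sum(1 for xm, ym in zip(x, y) if xm == 1 and ym == 1)
    return a, b, c, d


def cond_entropy_star(x, y):
    """H*(X_i | Y_j): eq. (1)-(3) when the vectors agree more than they
    disagree, otherwise fall back to H(X_i) as in eq. (4)."""
    n = len(x)
    a, b, c, d = confusion_counts(x, y)
    if h(a, n) + h(d, n) > h(b, n) + h(c, n):
        joint = h(a, n) + h(b, n) + h(c, n) + h(d, n)  # H(X_i, Y_j)
        return joint - h(b + d, n) - h(a + c, n)       # minus H(Y_j)
    return h(c + d, n) + h(a + b, n)                   # H(X_i)


x = [1, 1, 0, 0, 0, 0]
identical = cond_entropy_star(x, x)
complement = cond_entropy_star(x, [1 - v for v in x])
```

For identical vectors the result is zero bits. For exact complements the plain conditional entropy would also be zero, but the override returns the full H(X_i), so a negatively correlated pair is treated as sharing no information.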

This allows us to compare vectors X_i and Y_j, but we want to
compare the entire matrices X and Y to each other. We will follow
the approximation used by [1] here and match each vector in X to
its best match in Y,



$$
H(X_i \mid Y) = \min_{j \in \{1, \dots, K_Y\}} H^*(X_i \mid Y_j) \qquad (5)
$$

then summing across all the vectors in X,

$$
H(X \mid Y) = \sum_{i \in \{1, \dots, K_X\}} H(X_i \mid Y) \qquad (6)
$$



¹ https://github.com/aaronmcdaid/Overlapping-NMI



H(Y|X) is defined in a similar way to H(X|Y), but with the
roles reversed.



II. USEFUL IDENTITIES

Fig. 1 gives us an easy way to remember the following useful
identities, which apply to any mutual information context.
$$
\begin{aligned}
H(X) &= I(X : Y) + H(X \mid Y) \\
H(Y) &= I(X : Y) + H(Y \mid X) \\
H(X, Y) &= H(X) + H(Y \mid X) \\
H(X, Y) &= H(Y) + H(X \mid Y) \\
H(X, Y) &= I(X : Y) + H(X \mid Y) + H(Y \mid X)
\end{aligned}
$$

The first two equalities give us two definitions for the mutual
information, I(X : Y). In theory, these should be identical, but due
to the approximation used in eq. (5) they may be different. Therefore,
we will use the average of the two.



$$
I(X : Y) = \frac{1}{2} \left[ H(X) - H(X \mid Y) + H(Y) - H(Y \mid X) \right] \qquad (7)
$$



We are now ready to discuss normalization, contrasting the method
of [1] with our alternative.

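Lancichinetti et al. [1] define their own normalization of the
variation of information,

$$
\frac{1}{2} \left( \frac{H(X \mid Y)}{H(X)} + \frac{H(Y \mid X)}{H(Y)} \right) \qquad (8)
$$

and hence their normalized mutual information is

$$
\mathrm{NMI}_{LFK} = 1 - \frac{1}{2} \left( \frac{H(X \mid Y)}{H(X)} + \frac{H(Y \mid X)}{H(Y)} \right) \qquad (9)
$$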

There are of course many ways to normalize a quantity such as the
variation of information. Normalization typically involves division by
a quantity c,

$$
\frac{H(X \mid Y) + H(Y \mid X)}{c(X, Y)} \qquad (10)
$$

where c is a function of X and Y which is guaranteed to be
greater than or equal to the numerator. But NMI_LFK does not use
a normalization of this standard form, instead using eq. (8).

There is another aspect to the non-standard normalization used
in NMI_LFK; they insert an extra normalization factor into their
definition of H(X_i|Y_j). But this is not the root cause of the problems
we will describe, hence we will not dwell on it. Our change is to
remove all the normalization steps from their analysis and instead
use a more conventional normalization of the form of eq. (10).

III. UNINTUITIVE BEHAVIOUR

There are circumstances where NMI_LFK overestimates the similarity
of two covers. We will show how an alternative normalization
will fix these problems.

Imagine a cover X, and we are comparing it to a cover Y. Further, 
imagine Y has only one cluster (K_Y = 1) and this cluster is identical
to one of the clusters in X. For large K_X, we would expect the
normalized mutual information to be quite low. An intuitive result 
would be approximately 

However, NMI_LFK(X, Y) will be at least 0.5 in cases like this.
This is because H(Y|X) will be zero bits (the single cluster in Y
can be encoded with zero bits because it has a perfect match among
the clusters of X) and this will result in a contribution of 0.5 to the
NMI_LFK.
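The bound can be read off eq. (9) directly. The step below assumes that H(X|Y) never exceeds H(X), which holds if H(X) is taken as the sum of the per-cluster entropies H(X_i), since each term of eq. (6) is capped at H(X_i) by eq. (4):

$$
\mathrm{NMI}_{LFK}(X, Y)
= 1 - \frac{1}{2} \left( \frac{H(X \mid Y)}{H(X)} + \frac{0}{H(Y)} \right)
= 1 - \frac{H(X \mid Y)}{2 H(X)}
\geq \frac{1}{2}
$$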

The other problematic example involves the power set. There are 
n objects in total. A cover involving every subset of the n objects 
will create 2^n - 1 clusters; we will ignore the empty subset. This is
the power set, which we denote as p(n). 

NMI_LFK(X, p(n)) will again be slightly greater than 0.5. This is
because every cluster in X will have a perfect match in p(n) and
this will result in H(X|p(n)) = 0.



In both these examples NMI_LFK gives a score slightly above 0.5.
The intuitive behaviour in these cases would be for a similarity score
close to 0. We will demonstrate this behaviour in our experiments in
section V.

When we remove the normalization from NMI_LFK, and instead
use a more conventional normalization strategy, eq. (10), we will find
more intuitive behaviour.

IV. NORMALIZATION 
Fig. 2. As more communities are found, the scores of NMI_LFK and NMI_max
increase. For a small number of communities found, the intuitive result is a
small value, and this is the behaviour of our proposed measure.

Typically a normalization will involve a simple division of the
absolute quantity by a quantity which is guaranteed to be an upper
bound, giving us a number between zero and one.

The following sequence of inequalities from Vinh et al. [4] provides
possibilities for normalization.

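$$
\begin{aligned}
I(X : Y) &\leq \min(H(X), H(Y)) \\
&\leq \sqrt{H(X) H(Y)} \\
&\leq \tfrac{1}{2} \left( H(X) + H(Y) \right) \\
&\leq \max(H(X), H(Y)) \\
&\leq H(X, Y)
\end{aligned}
\qquad (11)
$$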

Any of the five expressions on the right can be used, and [4]
suggest a measure based on max(H(X), H(Y)). The Normalized
Information Distance is recommended

$$
d_{\max} = 1 - \frac{I(X : Y)}{\max(H(X), H(Y))}
$$

where zero means perfect similarity and one means dissimilarity. 
We want a measure with the opposite behaviour, so we'll use the 
corresponding normalized mutual information 



$$
\mathrm{NMI}_{\max} = \frac{I(X : Y)}{\max(H(X), H(Y))} \qquad (12)
$$



where I(X : Y) is as defined in eq. (7), using the conditional entropies of eqs. (5) and (6).
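The chain of bounds in eq. (11), and the normalization of eq. (12) that it justifies, can be checked numerically for ordinary Shannon quantities. The sketch below uses a made-up 2 x 2 joint distribution and hypothetical function names; it works with the exact mutual information of that distribution, not the best-match approximation of eqs. (5) to (7).

```python
import math


def entropy(probs):
    """Shannon entropy in bits, skipping zero-probability cells."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def info_quantities(joint):
    """Return (I, H(X), H(Y), H(X,Y)) for a joint probability table."""
    hx = entropy([sum(row) for row in joint])
    hy = entropy([sum(col) for col in zip(*joint)])
    hxy = entropy([p for row in joint for p in row])
    return hx + hy - hxy, hx, hy, hxy


def bound_chain(joint):
    """I(X:Y) followed by the five upper bounds, in the order of eq. (11)."""
    mi, hx, hy, hxy = info_quantities(joint)
    return [mi, min(hx, hy), math.sqrt(hx * hy),
            0.5 * (hx + hy), max(hx, hy), hxy]


joint = [[0.4, 0.1], [0.2, 0.3]]  # illustrative values only
chain = bound_chain(joint)
nmi_max = chain[0] / chain[4]     # eq. (12): divide by max(H(X), H(Y))
```

Dividing by any later entry of `chain` also gives a value between zero and one; the later the entry, the smaller the resulting score.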

This can also be understood with reference to Fig. 1. The problem
with NMI_LFK arises when one cover is more complicated than the
other, for example if one cover has many more clusters than the other
cover. This corresponds to one circle in Fig. 1 being much larger than
the other. In both the unintuitive examples mentioned in section III
we will find that one of the circles will be much larger than the other
and that the overlap between the two circles will be quite large, almost
the full size of the smaller circle. As a result, one of the terms inside
the brackets in eq. (9) will be small and will bring the NMI_LFK to
0.5.

V. EVALUATION 

See Fig. 2. There are 200 nodes, divided into 20 communities. Each
community has 10 nodes and they do not overlap. We fix one of our 
covers, X, to be the full set of twenty communities. Y contains a 
subset of these communities. As we go from left to right, the number 
of communities in Y increases from 1 to 20. 

The communities in Y are perfect copies of communities in X. 
Therefore, X = Y when all 20 communities are used. We see this 
in Fig. 2 at the right, where both measures report an NMI of 1.0.

This plot confirms the unintuitive behaviour of NMI_LFK when few
communities are found. On the left of the plot, when Y has only one 
community, the score is 0.5. 

The linear relationship of our NMI_max, going from 0 to 1 as the
number of communities in Y increases, is intuitive.
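The experiment is small enough to re-run as a sketch. The code below is not the reference C++ implementation: function names are hypothetical, clusters are represented as Python sets, H(X) is assumed to be the sum of the per-cluster entropies (consistent with eq. (6)), h(0, n) is taken as 0, and the NMI_LFK value follows eq. (9) as written in this paper, without the extra per-cluster normalization of [1].

```python
import math


def h(w, n):
    """h(w, n) = -w * log2(w / n), with h(0, n) taken as 0 (assumed)."""
    return 0.0 if w == 0 else -w * math.log2(w / n)


def entropy(cluster, n):
    """H(X_i) for one cluster given as a set of node ids."""
    return h(len(cluster), n) + h(n - len(cluster), n)


def cond_entropy_star(xi, yj, n):
    """H*(X_i | Y_j) following eqs. (1)-(4)."""
    d = len(xi & yj)       # in both clusters
    c = len(xi) - d        # in X_i only
    b = len(yj) - d        # in Y_j only
    a = n - b - c - d      # in neither
    if h(a, n) + h(d, n) > h(b, n) + h(c, n):
        return (h(a, n) + h(b, n) + h(c, n) + h(d, n)
                - h(b + d, n) - h(a + c, n))
    return h(c + d, n) + h(a + b, n)


def cover_cond_entropy(x, y, n):
    """H(X | Y): best match per cluster (eq. 5), summed over X (eq. 6)."""
    return sum(min(cond_entropy_star(xi, yj, n) for yj in y) for xi in x)


def nmi_scores(x, y, n):
    """Return (NMI_max, NMI_LFK); assumes H(X) and H(Y) are non-zero."""
    hx = sum(entropy(c, n) for c in x)
    hy = sum(entropy(c, n) for c in y)
    hx_y = cover_cond_entropy(x, y, n)
    hy_x = cover_cond_entropy(y, x, n)
    mi = 0.5 * (hx - hx_y + hy - hy_x)             # eq. (7)
    nmi_max = mi / max(hx, hy)                     # eq. (12)
    nmi_lfk = 1 - 0.5 * (hx_y / hx + hy_x / hy)    # eq. (9)
    return nmi_max, nmi_lfk


n = 200
x = [set(range(10 * i, 10 * i + 10)) for i in range(20)]
# Y holds the first k communities of X, for k = 1..20.
results = {k: nmi_scores(x, x[:k], n) for k in range(1, 21)}
```

Under these assumptions `results[k][0]` rises linearly as k / 20, while `results[k][1]` starts just above 0.5 (0.525 at k = 1) and both reach 1.0 at k = 20.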

VI. CONCLUSION 

We have identified unintuitive behaviour in the version of NMI 
proposed by [1]. We have identified the root cause of the behaviour
and shown how the use of a conventional normalization can lead to 
more intuitive behaviour. 

A simple experiment was performed to confirm the existence of the 
unintuitive behaviour and demonstrate the more intuitive behaviour. 

There are a variety of normalized measures to measure the similarity
of covers. There is no unique set of evaluation criteria to decide
on the best, but we suggest that our measure is the most intuitive 
definition based on normalized mutual information. 